Abstract

Background: We previously developed the DBRF-MEGN (difference-based regulation finding-minimum equivalent
gene network) method, which deduces the most parsimonious signed directed graphs (SDGs) consistent with
expression profiles of single-gene deletion mutants. However, until the present study, we have not presented the 
details of the method's algorithm or a proof of the algorithm. 

Results: We describe in detail the algorithm of the DBRF-MEGN method and prove that the algorithm deduces all 
of the exact solutions of the most parsimonious SDGs consistent with expression profiles of gene deletion mutants. 

Conclusions: The DBRF-MEGN method provides all of the exact solutions of the most parsimonious SDGs 
consistent with expression profiles of gene deletion mutants. 
Background

Identification of gene regulatory networks (hereafter 
called gene networks) is essential for understanding cel- 
lular functions. Large-scale gene deletion projects [1-4] 
and DNA microarrays [5,6] have enabled the creation of 
large-scale gene expression profiles of gene deletion 
mutants [7,8]; these large-scale profiles comprise the 
expression levels of thousands of genes measured in 
deletion mutants of those genes. Such profiles are 
invaluable sources for identifying gene networks. Many 
procedures have been developed for inferring gene net- 
works from such profiles [9-18]. 

Kyoda et al. developed the DBRF-MEGN (difference- 
based regulation finding-minimum equivalent gene net- 
work) method, an algorithm for inferring gene networks 
from large-scale gene expression profiles of gene dele- 
tion mutants [14]. In this algorithm, gene networks are 
modeled as signed directed graphs (SDGs) in which a 
regulation between two genes is represented as a signed 
directed edge whose sign - positive or negative - repre- 
sents whether the effect of the regulation is activation or 
inhibition and whose direction represents which gene 
regulates which other gene; the most parsimonious
SDGs consistent with the expression profiles are thus 
deduced. Kyoda et al. showed that the method is applic- 
able to large-scale gene expression profiles of gene dele- 
tion mutants and that networks deduced by the method 
are valid and useful for predicting functions of genes 
[14]. However, details of the method's algorithm and a 
proof of the algorithm have not previously been 
published. 

Here we describe in detail the algorithm of the DBRF- 
MEGN method and prove that the algorithm provides 
all of the exact solutions of the most parsimonious gene 
networks consistent with expression profiles of gene 
deletion mutants. 

Implementation 

The software of the DBRF-MEGN method was written 
in C++ under Linux. The complete source code files, a 
binary Linux executable file, and the software manual 
are available [see Additional File 1]. 

Results 

Difference-based deduction of initially deduced edges 
and the minimum equivalent gene networks 

The DBRF-MEGN method consists of five processes, 
namely (1) difference-based deduction of initially 
deduced edges, (2) removal of non-essential edges from 
the initially deduced edges, (3) selection of the uncovered
edges in main components from the non-essential
edges, (4) separation of the uncovered edges in main 
components into independent groups, and (5) restora- 
tion of the minimum number of edges from each inde- 
pendent group [14]. First, we define a gene network 
modeled as an SDG: 

Definition 1: A signed directed graph (SDG) is given
by a tuple $G = (V, E, f)$ with a set $V$ of nodes (genes), a
set $E \subseteq V \times V$ of directed edges, and an edge sign function
$f: E \to \{\pm 1\}$, which is an integral part of an SDG.

The first process of the DBRF-MEGN method is 
"difference-based deduction of initially deduced edges" 
(Figure 1b), which uses an assumption that is commonly
made in genetics and cell biology [14], i.e., there exists a 
positive (negative) regulation from gene A to gene B 
when the expression level of gene B in the deletion 
mutant of gene A is significantly lower (higher) than in 
the wild-type (Figure 1a). For each possible pair of genes
in the profiles, the process determines whether positive 
(negative) regulations between those genes exist and 
deduces all edges consistent with both the assumption 
and the profiles by detecting the difference in expression 
levels between the wild type and deletion mutants; we 
call these edges initially deduced edges. 

Definition 2: Let us assume the intervention experiments
have been performed for the gene set $J$, $J \subseteq V$. Let
$D = (d_{jk}) \in \mathbb{R}^{J \times V}$ be a matrix such that $d_{jk}$ represents the
expression of gene $k$ after an intervention in gene $j$ (relative
to wild-type expression). From this, we deduce the
graph of initially deduced edges, $G_{ide} = (V, E_{ide}, f)$. We
assume a negative regulation of $k$ by $j$ if $d_{jk} > \alpha$ for
some suitably chosen constant $\alpha$. Analogously, a positive
regulation of $k$ by $j$ is postulated whenever $d_{jk} < \beta$ for
some $\beta$ (sensibly, we require $\beta < 0 < \alpha$). Formally,

$$E_{ide} = \{(j, k) \in J \times V \mid d_{jk} > \alpha \text{ or } d_{jk} < \beta\}$$

and $f: E_{ide} \to \{\pm 1\}$ is given by $f((j,k)) = 1$ if there is a
positive regulation of $k$ by $j$, and otherwise $f((j,k)) = -1$.

The thresholds $\alpha$ and $\beta$ determine the significance of
the difference in expression levels between the wild type 
and deletion mutants. These thresholds can be specified 
by various procedures such as by using fold-change or 
the statistical significance of the expression level 
[7,8,14,19,20]. 

The DBRF-MEGN method deduces the most parsimo- 
nious SDGs consistent with the SDG that consists of 
the initially deduced edges. Before defining the most 
parsimonious SDGs, we need to introduce the function 
exp and the concept cover (Figure 2). 

Definition 3: If, and only if, $\exists\, (i, j), (j, k), (i, k) \mid f(i, j) \times f(j, k) = f(i, k)$,
then $exp(i, j, k) = 1$; otherwise, $exp(i, j, k) = 0$.
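Definition 3 can be illustrated with a minimal Python sketch. The dictionary representation of the sign function and the name `explains` are assumptions made here for illustration; they are not taken from the DBRF-MEGN software.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]
SignedEdges = Dict[Edge, int]  # hypothetical encoding: (source, target) -> +1 or -1


def explains(f: SignedEdges, i: str, j: str, k: str) -> int:
    """Return 1 if edges (i, j) and (j, k) explain edge (i, k), else 0."""
    if (i, j) in f and (j, k) in f and (i, k) in f:
        # The product of the two signs must equal the sign of the direct edge.
        return 1 if f[(i, j)] * f[(j, k)] == f[(i, k)] else 0
    return 0


# Two successive inhibitions are consistent with a direct activation.
f = {("A", "B"): -1, ("B", "C"): -1, ("A", "C"): +1}
print(explains(f, "A", "B", "C"))
```

The function returns 0 whenever one of the three edges is absent, which mirrors the existential condition in the definition.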



Definition 4: Let $E_p \subseteq E_{ide}$ be a set of edges. Define
$E_p^{(0)} = E_p$ and by induction
$E_p^{(r+1)} = E_p^{(r)} \cup \{(j, k) \in E_{ide} \mid \exists\, (j, i), (i, k) \in E_p^{(r)}$
such that $exp(j, i, k) = 1\}$. Moreover, let $E_p^{cov} = E_p^{(\infty)}$.
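The closure in Definition 4 can be computed by iterating to a fixed point. The sketch below is an illustration only: the function name `cover`, the dictionary encoding of the sign function, and the example edge set (loosely modeled on the situation described for Figure 2, with all signs assumed positive) are assumptions. It updates the covered set in place rather than in synchronous rounds, which reaches the same fixed point because the operation is monotonic.

```python
def cover(edges, f):
    """Return the set of edges of E_ide (the keys of f) covered by `edges`."""
    covered = set(edges)
    changed = True
    while changed:
        changed = False
        for (j, k) in f:
            if (j, k) in covered:
                continue
            # (j, k) becomes covered if covered edges (j, i) and (i, k) explain it.
            if any(a == j and (i, k) in covered
                   and f[(j, i)] * f[(i, k)] == f[(j, k)]
                   for (a, i) in covered):
                covered.add((j, k))
                changed = True
    return covered


# Hypothetical example: every edge is an activation.
f = {e: +1 for e in [("A", "C"), ("C", "D"), ("B", "A"),
                     ("A", "D"), ("B", "D"), ("D", "E")]}
e_p = {("A", "C"), ("C", "D"), ("B", "A")}
print(sorted(cover(e_p, f)))
```

In this example (A, D) is added first and then allows (B, D) to be added, while (D, E) stays uncovered because no covered edge leaves D.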

Remark: The family of edge sets on $V$ is partially
ordered by set inclusion. If $E_1 \subseteq E_2$, note that by a trivial
induction on $r$, $E_1^{(r)} \subseteq E_2^{(r)}$, and hence
$E_1^{cov} = \bigcup_{r=0}^{\infty} E_1^{(r)} \subseteq \bigcup_{r=0}^{\infty} E_2^{(r)} = E_2^{cov}$.
This means that the mapping $\cdot^{cov}: E \mapsto E^{cov}$ is monotonic.
Let $E \subseteq E_{ide}$. By construction, an edge $(j, k)$ from $(E^{cov})^{cov}$
is an element of $(E^{(r)})^{(s)} = E^{(r+s)}$ for suitable $r, s \in \mathbb{N}$. This
implies $E^{cov} \subseteq (E^{cov})^{cov} \subseteq \bigcup_{t=0}^{\infty} E^{(t)} = E^{cov}$. Thus $(E^{cov})^{cov} = E^{cov}$, and
the mapping $\cdot^{cov}: E \mapsto E^{cov}$ is a so-called closure operation.

Lemma 1: If $E_1 \subseteq E_2$, then $E_1^{cov} \subseteq E_2^{cov}$.

Proof: The remark proves lemma 1.

Lemma 2: If $E_1 \subseteq E_2^{cov}$, then $E_1^{cov} \subseteq E_2^{cov}$.

Proof: $E_1^{cov} \subseteq (E_2^{cov})^{cov} = E_2^{cov}$ by monotonicity and
closure of the mapping $\cdot^{cov}$.

Lemma 3: If $E_1 \subseteq E_2^{cov}$ and $E_3 \subseteq E_1^{cov}$, then $E_3 \subseteq E_2^{cov}$.

Proof: $E_3 \subseteq E_1^{cov} \subseteq (E_2^{cov})^{cov} = E_2^{cov}$ by monotonicity
and closure of the mapping $\cdot^{cov}$.

Now, we define the most parsimonious SDGs consis- 
tent with the expression profiles of gene deletion 
mutants. A most parsimonious SDG consists of the mini- 
mum number of edges that "cover" all initially deduced 
edges. By this definition, an edge can be redundant only 
when it is "explained" by two other initially deduced 
edges. Importantly, an edge is not redundant when it is 
"explained" by only three or more initially deduced edges 
(Figure 3a). We call the most parsimonious SDGs mini- 
mum equivalent gene networks (MEGNs). 

Definition 5: $G_0 = (V, E_0, f_{E_0})$ (where $f_{E_0}$ is the restriction
of $f$ to $E_0$) is a most parsimonious SDG, named a
MEGN, of $G = (V, E_{ide}, f)$ if and only if it satisfies the following
conditions: (1) $E_0 \subseteq E_{ide}$, (2) $E_0^{cov} = E_{ide}$, (3) $\forall E_p \subseteq E_{ide}$
such that $E_p^{cov} = E_{ide}$, $|E_0| \le |E_p|$. Since we keep
$G = (V, E_{ide}, f)$ fixed for the rest of the paper, we often call $G_0$
simply a MEGN, without explicit reference to $G$.

Removal of non-essential edges from the initially 
deduced edges 

The second process of the DBRF-MEGN method 
removes all non-essential edges from the initially deduced 
edges. The process removes all edges that are explained 
by two other initially deduced edges (Figure 1c). The
resulting edges are called essential edges and the removed 
edges are called non-essential edges. 

Definition 6: If there exist $(i, j), (j, k), (i, k) \in E_{ide}$
such that $exp(i, j, k) = 1$, then $(i, k)$ is called a non-essential
edge. Let $E_{nes}$ be the set of non-essential edges. The
set $E_{es}$ of essential edges is the complement of $E_{nes}$ in
$E_{ide}$: $E_{es} = E_{ide} \setminus E_{nes}$.
Figure 1 An example of the deduction of MEGNs from the expression profiles of gene deletion mutants. (a) An assumption used in the
DBRF-MEGN method. (b) Deduction of the initially deduced edges. The matrix represents a set of expression profiles and the schematic
represents a set of initially deduced edges. In the matrix, A, B, ... represent expression levels of gene A, gene B, ... and ΔA, ΔB, ... represent
deletion mutants of gene A, B, ... The up (down) arrows indicate that the gene expression levels are higher (lower) in the deletion mutant than
in the wild type, (c) Essential edges. Non-essential edges are gray-dotted, (d) Uncovered edges. Uncovered edges are gray-dotted and covered 
edges are black-dotted, (e) Exclusion of uncovered edges in peripheral components. (O, /) € (N, /) € (M,/) € E^L and 
(/, /) € E^) v are uncovered edges in peripheral components. The resulting four gray-dotted edges are uncovered edges in main components, 
(f) Independent groups of uncovered edges in main components. For each group, the minimum number of edges with which essential edges 
can explain all edges in the group are shown: (E, J) or (F, J) for G0, and (H, K) or (H, L) for G1. (g) Four MEGNs of the profiles. Combinations of
the minimum numbers of edges of two independent groups (G0 and G1) produce all four MEGNs.
Figure 2 Introduction to the function "exp" and the concept
"cover". (a) Initially deduced edges ($E_{ide}$). (b) A set of edges ($E_p$). (c)
Four cases of exp(a, b, c) = 1. In each case, (a, c) is explained by (a, b)
and (b, c). (d) The set of edges that are covered by edges in $E_p$
($E_p^{cov}$). (A, C) and (C, D) explain (A, D); then (A, D) and (B, A) explain
(B, D). Thus, (A, D) and (B, D) are covered by edges in $E_p$. (D, E) is not
covered by edges in $E_p$ because (D, E) cannot be explained by edges
in $E_p$. (e) Edges that are covered by edges in $E_p$ (black) and those
that are not covered by edges in $E_p$ (gray-dotted).



Essential edges and non-essential edges have the 
following properties. 

Lemma 4: If $E_p \subseteq E_{ide}$ and $E_p^{cov} \supseteq E_{es}$, then $E_p \supseteq E_{es}$.

Proof: Assume that there exists $(i, j) \in E_{es}$ such that
$(i, j) \in E_p^{cov}$ and $(i, j) \notin E_p$. Because $(i, j) \in E_p^{cov}$ and $(i, j) \notin E_p$,
there exist $(i, k), (k, j) \in E_p^{cov}$ such that $exp(i, k, j) = 1$. This
contradicts our assumption $(i, j) \in E_{es}$.

Lemma 5: If $G_0 = (V, E_0, f_{E_0})$ is a MEGN, then $E_{es} \subseteq E_0$.

Proof: $E_0^{cov} = E_{ide} \supseteq E_{es}$, hence $E_0 \supseteq E_{es}$ by lemma 4.

When the essential edges cover all initially deduced 
edges, the SDG consisting of the essential edges is the 
only MEGN consistent with the profiles. 








Figure 3 Difference between MEGN and MEG. Deduction of the 
MEGN (a) and the MEG (b) from the same graph is shown. The 
MEGN includes the edge from A to D because no two edges explain 
the edge. In contrast, the MEG does not include the edge from A to 
D because A can reach D without using the edge from A to D 
(A→B→C→D). The MEGN consists only of the essential edges.



Theorem 1: If $E_{es}^{cov} = E_{ide}$, then $G_{es} = (V, E_{es}, f_{E_{es}})$ is the
unique MEGN of $G = (V, E_{ide}, f)$.

Proof: By hypothesis, conditions (1) $E_{es} \subseteq E_{ide}$, and (2)
$E_{es}^{cov} = E_{ide}$, of a MEGN are met. It remains to show the
uniqueness and minimality of $E_{es}$. (3) Let $G_0 = (V, E_0, f_{E_0})$
be an arbitrary MEGN. Then by lemma 5, $E_{es} \subseteq E_0$, and
by minimality of $E_0$, it follows that $E_{es} = E_0$. The theorem
is proved.

Selection of the uncovered edges in main components 
from the non-essential edges 

The essential edges sometimes fail to cover all initially
deduced edges because some edges in the initially deduced
edges represent direct gene regulations even when they
are explained by two other edges (Figure 1d). In this case,
the method restores the minimum number of non-essen- 
tial edges so that the resulting edges (essential edges and 
the restored non-essential edges) cover all initially deduced 
edges. The SDG, consisting of essential edges and of the 
restored non-essential edges, is a MEGN. Before selecting 
the sets of non-essential edges to be restored, the method 
distinguishes non-essential edges that have a chance to be 
included in the MEGNs from those that do not in order 
to reduce the number of non-essential edges to be con- 
sidered for the restoration and thus to reduce the 
computational cost to find non-essential edges to be
restored. This third process of the DBRF-MEGN method 
consists of two sub-processes, namely (a) selection of 
uncovered edges and (b) selection of uncovered edges in 
main components. The resulting non-essential edges are 
called uncovered edges in main components, and from 
these edges the later processes of the DBRF-MEGN 
method select edges that are included in the MEGNs. 

a) Selection of uncovered edges 

The first sub-process distinguishes the non-essential 
edges that are covered by the essential edges from those 
that are not (Figure 1d). Those edges are called covered
edges and uncovered edges, respectively. 

Definition 7: Let $E_{cv} = (E_{es})^{cov} \setminus E_{es}$ be the set of
covered edges. Let $E_{ucv} = E_{ide} \setminus (E_{es} \cup E_{cv})$ be the set of
uncovered edges. The set of initially deduced edges is
thereby partitioned into three disjoint edge sets:
$E_{ide} = E_{es} \cup E_{cv} \cup E_{ucv}$.

Here, we prove that the MEGNs do not include cov- 
ered edges. 

Lemma 6: If $G_0 = (V, E_0, f_{E_0})$ is a MEGN, then
$E_{es} \subseteq E_0 \subseteq E_{es} \cup E_{ucv}$.

Proof: First, $E_{es} \subseteq E_0$ by lemma 5. By definition 7,
$E_{es} \subseteq E_0 \setminus E_{cv}$, hence $(E_0 \setminus E_{cv})^{cov} \supseteq (E_0 \setminus E_{cv}) \cup E_{es}^{cov} \supseteq E_0$ by
monotonicity of $\cdot^{cov}$. It follows that $(E_0 \setminus E_{cv})^{cov} \supseteq E_0^{cov} = E_{ide}$ by
lemma 2. By minimality of $E_0$, $E_0 = E_0 \setminus E_{cv}$, which is
equivalent to $E_0 \cap E_{cv} = \emptyset$. By definition 7, this implies
$E_0 \subseteq E_{es} \cup E_{ucv}$, completing the proof.

b) Selection of uncovered edges in main components 

The second sub-process distinguishes uncovered edges 
that have a chance to be included in the MEGNs from 
those that do not (Figure 1e; Figure 4). Those edges are
called uncovered edges in main components and uncov- 
ered edges in peripheral components. The uncovered 
edges in peripheral components are defined as follows: 

Definition 8: Define $E_{ucv}^{(0)}$ to be the set of uncovered
edges $(i, j) \in E_{ucv}$ which cannot be used to directly
explain another uncovered edge in $E_{ucv}$ with the other
edges $(k, i) \in E_{ide}$ or $(j, k) \in E_{ide}$.

Lemma 7: $(E_{ide} \setminus E_{ucv}^{(0)})^{cov} \supseteq E_{ucv}^{(0)}$.

Proof: By definition 8, the edges in $E_{ucv}^{(0)}$ cannot explain
other uncovered edges in $E_{ucv}$. Therefore, the edges in
$E_{ucv}^{(0)}$ can be explained by the edges in $E_{ide} \setminus E_{ucv}^{(0)}$. The
lemma is proved.

Definition 9: Following definition 8, define
$E_{ucv}^{(r+1)} = \{(i, j) \in E_{ucv} \setminus \bigcup_{s=0}^{r} E_{ucv}^{(s)}$ which cannot be used
to directly explain another uncovered edge in $E_{ucv} \setminus \bigcup_{s=0}^{r} E_{ucv}^{(s)}$
with the other edges $(k, i) \in E_{ide}$ or $(j, k) \in E_{ide}\}$. Let
$E_{ucv}^{pc} = \bigcup_{r=0}^{\infty} E_{ucv}^{(r)}$ be the set of uncovered edges in peripheral
components. Let $E_{ucv}^{mc} = E_{ucv} \setminus E_{ucv}^{pc}$ be the set of uncovered
edges in main components. The set of initially
Figure 4 Example of uncovered edges in peripheral and main
components. (a) Initially deduced edges ($E_{ide}$). (b) Essential edges
($E_{es}$). Non-essential edges are dotted. (c) Uncovered edges ($E_{ucv}$).
Because all non-essential edges cannot be covered by essential
edges, the non-essential edges are called uncovered edges ($E_{ucv}$). (d)
Uncovered edges in peripheral components (gray-dotted). (B, D)
and (B, E) are uncovered edges in peripheral components because
(B, E) does not explain any other edges in $E_{ucv}$ with an edge in $E_{ide}$,
and (B, D) does not explain any other edges in $E_{ucv}$ except (B, E)
with an edge in $E_{ide}$. (A, C) and (B, C) are uncovered edges in main
components. If edges in a MEGN cover (A, C) and (B, C), the edges
also cover (B, D) and (B, E). (B, D) and (B, E) cover no edges except
themselves. Thus, (B, D) and (B, E) are not included in the MEGNs.



deduced edges is thereby partitioned into four disjoint
edge sets: $E_{ide} = E_{es} \cup E_{cv} \cup E_{ucv}^{mc} \cup E_{ucv}^{pc}$.

In the following, we prove that the MEGNs do not 
include uncovered edges in peripheral components. 
First, we prove that uncovered edges in peripheral com- 
ponents have the following properties. 

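Lemma 8: $(E_{ide} \setminus \bigcup_{r=0}^{\infty} E_{ucv}^{(r)})^{cov} \supseteq \bigcup_{r=0}^{\infty} E_{ucv}^{(r)}$.

Proof: We prove lemma 8 by mathematical induction on the upper limit of the unions.
(1) By lemma 7, lemma 8 is true when r = 0. By
definitions 8 and 9, $(E_{ide} \setminus (E_{ucv}^{(0)} \cup E_{ucv}^{(1)}))^{cov} \supseteq E_{ucv}^{(1)}$, hence
$(E_{ide} \setminus (E_{ucv}^{(0)} \cup E_{ucv}^{(1)}))^{cov} \supseteq (E_{ide} \setminus (E_{ucv}^{(0)} \cup E_{ucv}^{(1)})) \cup E_{ucv}^{(1)} = E_{ide} \setminus E_{ucv}^{(0)}$.
By lemmas 2 and 7, $(E_{ide} \setminus (E_{ucv}^{(0)} \cup E_{ucv}^{(1)}))^{cov} \supseteq E_{ucv}^{(0)}$. Thus,
lemma 8 is true when r = 1. (2) Assume that
lemma 8 is true when r = m. This means that we
assume that $(E_{ide} \setminus \bigcup_{r=0}^{m} E_{ucv}^{(r)})^{cov} \supseteq \bigcup_{r=0}^{m} E_{ucv}^{(r)}$ (2a). By
definition 9, $(E_{ide} \setminus \bigcup_{r=0}^{m+1} E_{ucv}^{(r)})^{cov} \supseteq E_{ucv}^{(m+1)}$ (2b). Because
$(E_{ide} \setminus \bigcup_{r=0}^{m+1} E_{ucv}^{(r)})^{cov} \supseteq E_{ide} \setminus \bigcup_{r=0}^{m+1} E_{ucv}^{(r)}$ and (2b),
$(E_{ide} \setminus \bigcup_{r=0}^{m+1} E_{ucv}^{(r)})^{cov} \supseteq E_{ide} \setminus \bigcup_{r=0}^{m} E_{ucv}^{(r)}$ (2c). Because (2a),
(2c) and lemma 3, $(E_{ide} \setminus \bigcup_{r=0}^{m+1} E_{ucv}^{(r)})^{cov} \supseteq \bigcup_{r=0}^{m} E_{ucv}^{(r)}$ (2d).
Because (2b) and (2d), $(E_{ide} \setminus \bigcup_{r=0}^{m+1} E_{ucv}^{(r)})^{cov} \supseteq \bigcup_{r=0}^{m+1} E_{ucv}^{(r)}$.
Thus, lemma 8 is true when r = m + 1, if it is true when
r = m. By (1) and (2), lemma 8 is true.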

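Lemma 9: $(E_{ide} \setminus E_{ucv}^{pc})^{cov} \supseteq E_{ucv}^{pc}$.

Proof: By lemma 8, $(E_{ide} \setminus \bigcup_{r=0}^{\infty} E_{ucv}^{(r)})^{cov} \supseteq \bigcup_{r=0}^{\infty} E_{ucv}^{(r)}$.
Because $\bigcup_{r=0}^{\infty} E_{ucv}^{(r)} = E_{ucv}^{pc}$, lemma 9 is true.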



Now we prove that the MEGNs do not include uncov- 
ered edges in peripheral components. 

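Lemma 10: If $G_0 = (V, E_0, f_{E_0})$ is a MEGN, then
$E_0 \subseteq E_{es} \cup E_{ucv}^{mc}$.

Proof: Assume that there exists $(i, j) \in E_{ucv}^{pc} \cap E_0$.
Because of lemma 5 and definition 7,
$(E_0 \setminus \{(i, j)\})^{cov} \supseteq E_{es} \cup E_{cv}$, hence $E_{ide} \setminus (E_0 \setminus \{(i, j)\})^{cov} \subseteq E_{ucv}$ by
lemma 6. By the assumption $(i, j) \in E_{ucv}^{pc} \cap E_0$ and definition 8,
$E_{ide} \setminus (E_0 \setminus \{(i, j)\})^{cov} \subseteq E_{ucv}^{pc}$, hence $(E_0 \setminus \{(i, j)\})^{cov} \supseteq E_{ide} \setminus E_{ucv}^{pc}$.
By lemmas 2 and 9, $(E_0 \setminus \{(i, j)\})^{cov} \supseteq (E_{ide} \setminus E_{ucv}^{pc})^{cov} \supseteq (E_{ide} \setminus E_{ucv}^{pc}) \cup E_{ucv}^{pc} = E_{ide}$.
This contradicts our assumption that $G_0 = (V, E_0, f_{E_0})$ is
a MEGN. Therefore, $E_0 \cap E_{ucv}^{pc} = \emptyset$. By definition 9 and
lemma 6, this implies $E_0 \subseteq E_{es} \cup E_{ucv}^{mc}$, completing the
proof.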

Separation of the uncovered edges in main components 
into independent groups and restoration of the minimum 
number of edges from each independent group 

The fourth process of the DBRF-MEGN method sepa- 
rates uncovered edges in main components into "inde- 
pendent groups" so that edges to be restored can be 
deduced independently for each group (Figure 1f; Figure
5). For each group, the fifth process of the DBRF- 
MEGN method deduces the minimum number of edges 
with which essential edges can cover all edges in the 
group. All sets of such edges are deduced for each 
group. The essential edges and any possible combination 
of these sets from each group generate a MEGN of the 
profiles (Figure 1g).

The independent groups are generated so that the 
edges in one group do not cover those in other groups. 

Definition 10: Define $E_{ucv}^{ig_0(0)}$ to be a set consisting of
a single edge $e \in E_{ucv}^{mc}$, and by induction

$$E_{ucv}^{ig_0(r+1)} = E_{ucv}^{ig_0(r)} \cup \{(i, j) \in E_{ucv}^{mc} \setminus E_{ucv}^{ig_0(r)} \mid \exists (j, k) \in E_{ide}, (i, k) \in E_{ucv}^{ig_0(r)} \text{ such that } exp(i, j, k) = 1 \text{ or } \exists (k, i) \in E_{ide}, (k, j) \in E_{ucv}^{ig_0(r)} \text{ such that } exp(k, i, j) = 1 \text{ or } \exists (i, k) \in E_{ucv}^{ig_0(r)}, (k, j) \in E_{ide} \text{ such that } exp(i, k, j) = 1 \text{ or } \exists (i, k) \in E_{ide}, (k, j) \in E_{ucv}^{ig_0(r)} \text{ such that } exp(i, k, j) = 1\}.$$

Let $E_{ucv}^{ig_0} = E_{ucv}^{ig_0(\infty)}$ be
the set of edges in an independent group.
Let $E_{ucv}^{ig_c} = E_{ucv}^{ig_c(\infty)}$, where $E_{ucv}^{ig_c(0)}$ is a set consisting of a single
edge $e \in E_{ucv}^{mc} \setminus \bigcup_{s=0}^{c-1} E_{ucv}^{ig_s}$ and by induction

$$E_{ucv}^{ig_c(r+1)} = E_{ucv}^{ig_c(r)} \cup \{(i, j) \in (E_{ucv}^{mc} \setminus \bigcup_{s=0}^{c-1} E_{ucv}^{ig_s}) \setminus E_{ucv}^{ig_c(r)} \mid \exists (j, k) \in E_{ide}, (i, k) \in E_{ucv}^{ig_c(r)} \text{ such that } exp(i, j, k) = 1 \text{ or } \exists (k, i) \in E_{ide}, (k, j) \in E_{ucv}^{ig_c(r)} \text{ such that } exp(k, i, j) = 1 \text{ or } \exists (i, k) \in E_{ucv}^{ig_c(r)}, (k, j) \in E_{ide} \text{ such that } exp(i, k, j) = 1 \text{ or } \exists (i, k) \in E_{ide}, (k, j) \in E_{ucv}^{ig_c(r)} \text{ such that } exp(i, k, j) = 1\}.$$

Then, $\bigcup_{c=0}^{\infty} E_{ucv}^{ig_c} = E_{ucv}^{mc}$.

Figure 5 An example of independent groups. (a) Initially
deduced edges ($E_{ide}$). (b) Essential edges ($E_{es}$). Uncovered edges in
main components ($E_{ucv}^{mc}$) are dotted. (c) Independent groups of
uncovered edges in main components. Uncovered edges in main
components are separated into two independent groups G0
and G1. Edges in one group do not explain those in the other
group.

The essential edges and a combination of sets of the 
minimum number of edges for each independent group 
generate a MEGN of the profiles. 

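Definition 11: Let $E_{ucv\,min}^{ig_c}$ be the set of edges in the $c$th
independent group that satisfies (1) $E_{ucv\,min}^{ig_c} \subseteq E_{ucv}^{ig_c}$, (2)
$(E_{ucv\,min}^{ig_c} \cup E_{es})^{cov} \supseteq E_{ucv}^{ig_c}$, and (3) $\forall E_p \subseteq E_{ucv}^{ig_c}$ such that
$(E_p \cup E_{es})^{cov} \supseteq E_{ucv}^{ig_c}$, $|E_{ucv\,min}^{ig_c}| \le |E_p|$.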

We prove that the essential edges and a combination 
of sets of the minimum number of edges for each inde- 
pendent group generate a MEGN of the profiles as 
follows: 

Lemma 11: If there exist $(i, j) \in E_{ucv}^{ig_c}$, $(i, k), (k, j) \in
E_{ide}$ such that $exp(i, k, j) = 1$, then $\{(i, k), (k, j)\} \cap E_{ucv} \subseteq
E_{ucv}^{ig_c}$.

Proof: By definition 10, lemma 11 is true.

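Lemma 12: $(\bigcup_{c=0}^{\infty} E_{ucv\,min}^{ig_c} \cup E_{es})^{cov} = E_{ide}$.

Proof: By definitions 7 and 11,
$(\bigcup_{c=0}^{\infty} E_{ucv\,min}^{ig_c} \cup E_{es})^{cov} \supseteq \bigcup_{c=0}^{\infty} E_{ucv}^{ig_c} \cup E_{es} \cup E_{cv}$. Because
$\bigcup_{c=0}^{\infty} E_{ucv}^{ig_c} \cup E_{es} \cup E_{cv} = E_{ucv}^{mc} \cup E_{es} \cup E_{cv} = E_{ide} \setminus E_{ucv}^{pc}$,
$(\bigcup_{c=0}^{\infty} E_{ucv\,min}^{ig_c} \cup E_{es})^{cov} \supseteq E_{ide} \setminus E_{ucv}^{pc}$. By
lemmas 2 and 8, $(\bigcup_{c=0}^{\infty} E_{ucv\,min}^{ig_c} \cup E_{es})^{cov} \supseteq E_{ucv}^{pc}$. Therefore,
$(\bigcup_{c=0}^{\infty} E_{ucv\,min}^{ig_c} \cup E_{es})^{cov} = E_{ide}$.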

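Theorem 2: $G_{min} = (V, E_{min} = \bigcup_{c=0}^{\infty} E_{ucv\,min}^{ig_c} \cup E_{es}, f_{E_{min}})$
is a MEGN.

Proof: (1) $(\bigcup_{c=0}^{\infty} E_{ucv\,min}^{ig_c} \cup E_{es}) \subseteq E_{ide}$ by the condition of
theorem 2. (2) By lemma 12, $(\bigcup_{c=0}^{\infty} E_{ucv\,min}^{ig_c} \cup E_{es})^{cov} = E_{ide}$.
(3) By lemmas 4, 11 and definition 11, $\forall E_p \subseteq E_{ide}$
such that $E_p^{cov} \supseteq \bigcup_{c=0}^{\infty} E_{ucv}^{ig_c} \cup E_{es}$, $|\bigcup_{c=0}^{\infty} E_{ucv\,min}^{ig_c} \cup E_{es}| \le |E_p|$.
Because $(\bigcup_{c=0}^{\infty} E_{ucv}^{ig_c} \cup E_{es})^{cov} \supseteq (\bigcup_{c=0}^{\infty} E_{ucv\,min}^{ig_c} \cup E_{es})^{cov} = E_{ide}$
and lemma 2, $\forall E_p \subseteq E_{ide}$ such that $E_p^{cov} = E_{ide}$,
$|\bigcup_{c=0}^{\infty} E_{ucv\,min}^{ig_c} \cup E_{es}| \le |E_p|$. The theorem is proved.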

Remark: When there exists more than one solution
of the minimum number of edges for independent 
groups, the SDGs each of which consists of the essen- 
tial edges and a possible combination of sets of the 
solutions for each independent group are MEGNs 
because these SDGs must satisfy the conditions in 
definition 5. 

Algorithms of the DBRF-MEGN method 

We are concerned with algorithms that are computa- 
tionally efficient for deducing MEGNs from expression 
profiles of single-gene deletion mutants. We list these in 
a form easily translatable into a computer program. 
(A1) Algorithm for deducing initially deduced edges
double d[n][n]: gene expression profiles
int t[n][n]
void dbrf()
  int i, j;
  for i = 1 to n do
    for j = 1 to n do
      if d[i][j] < β & i ≠ j then
        t[i][j] := +1;
      else if d[i][j] > α & i ≠ j then
        t[i][j] := -1;
      else
        t[i][j] := 0;

The matrix d[n][n] represents the gene expression
profiles. Each entry d[i][j] represents the log-ratio of
the expression of gene j in gene i deletion mutants to
that in the wild-type. The non-zero entries of the
resulting matrix t[n][n] represent the initially deduced
edges. If an entry t[i][j] is +1 or -1, it represents a positive
or negative edge from gene i to gene j, respectively.
The number of complete iterations is bounded
by $n^2$.
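Algorithm A1 translates almost line by line into Python. The following is a minimal sketch, not the authors' C++ implementation; the function name `deduce_initial_edges`, the zero-based indexing, and the example thresholds and log-ratios are assumptions chosen for illustration.

```python
def deduce_initial_edges(d, alpha, beta):
    """Return the sign matrix t of initially deduced edges.

    d[i][j] is the log-ratio of gene j in the deletion mutant of gene i
    relative to the wild type; beta < 0 < alpha are the thresholds.
    """
    n = len(d)
    t = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a deleted gene is not compared with itself
            if d[i][j] < beta:
                t[i][j] = +1   # expression drops: gene i activates gene j
            elif d[i][j] > alpha:
                t[i][j] = -1   # expression rises: gene i inhibits gene j
    return t


# Hypothetical 3-gene profile with thresholds of one log unit.
d = [[-3.0, -1.5, 0.1],
     [0.0, -2.0, 1.2],
     [0.3, -0.2, -4.0]]
t = deduce_initial_edges(d, alpha=1.0, beta=-1.0)
print(t)
```

In this example gene 0 is deduced to activate gene 1, and gene 1 to inhibit gene 2; all other entries stay 0.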

(A2) Algorithm for distinguishing the essential edges from
the non-essential edges

int t[n][n]: initially deduced edges
void ess_noness()
  int i, j, k;
  for j = 1 to n do
    for i = 1 to n do
      if t[i][j] ≠ 0 then
        for k = 1 to n do
          if t[j][k] ≠ 0 & t[i][k] ≠ 0 & t[i][k] = t[i][j] × t[j][k] then
            check(t[i][k]);

The checked entries of the matrix t[n][n] represent
non-essential edges. The unchecked non-zero entries of
the resulting matrix t[n][n] represent essential edges. We
created this algorithm by modifying Warshall's algorithm
[21]. The number of complete iterations is
bounded by $n^3$.
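A compact Python rendering of algorithm A2 is sketched below. It returns the essential and non-essential edges as two sets instead of marking matrix entries; that choice, the function name `split_essential`, and the three-gene example are assumptions made for illustration.

```python
def split_essential(t):
    """Split the non-zero entries of t into (essential, non_essential) edge sets."""
    n = len(t)
    non_essential = set()
    for j in range(n):
        for i in range(n):
            if t[i][j] == 0:
                continue
            for k in range(n):
                # (i, j) and (j, k) explain (i, k) when the signs multiply correctly.
                if t[j][k] != 0 and t[i][k] != 0 and t[i][k] == t[i][j] * t[j][k]:
                    non_essential.add((i, k))
    essential = {(i, k) for i in range(n) for k in range(n)
                 if t[i][k] != 0 and (i, k) not in non_essential}
    return essential, non_essential


# Hypothetical example: 0 -> 1 -> 2 plus the shortcut 0 -> 2, all activations.
t = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
essential, non_essential = split_essential(t)
print(essential, non_essential)
```

The shortcut edge (0, 2) is explained by (0, 1) and (1, 2), so it is the only non-essential edge here.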

(A3.1) Algorithm for distinguishing uncovered edges from
covered edges

int t[n][n]: initially deduced edges
int e[n][n]: essential edges
void covered_edge()
  int i, j, k;
  bool finished;
  finished := false;
  while finished = false do
    finished := true;
    for i = 1 to n do
      for j = 1 to n do
        if e[i][j] ≠ 0 then
          for k = 1 to n do
            if e[j][k] ≠ 0 & t[i][k] ≠ 0 & e[i][k] = 0 & t[i][k] = e[i][j] × e[j][k] then
              e[i][k] := t[i][k];
              check(e[i][k]);
              finished := false;
The checked entries of the matrix e[n][n] represent
covered edges. The non-zero entries of the matrix t[n][n]
that differ from the non-zero entries of the resulting
matrix e[n][n] represent uncovered edges. This algorithm
iterates over the while loop to find edges in $E_{nes}$
that can be covered by the essential edges. Thus, the
number of iterations is bounded by $|E_{nes}| \cdot n^3$.
(A3.2) Algorithm for finding uncovered edges in peripheral
components

int t[n][n]: initially deduced edges
int u[n][n]: uncovered edges
void peripheral_uncovered()
  int i, j, k;
  bool flag, sflag;
  flag := false;
  while flag = false do
    flag := true;
    for j = 1 to n do
      for i = 1 to n do
        if u[i][j] ≠ 0 then
          sflag := false;
          for k = 1 to n do
            if t[j][k] ≠ 0 & u[i][k] ≠ 0 & t[i][k] = t[i][j] × t[j][k] then
              sflag := true;
            if t[k][i] ≠ 0 & u[k][j] ≠ 0 & t[k][j] = t[k][i] × t[i][j] then
              sflag := true;
          if sflag = false then
            check(u[i][j]);
            flag := false;
    rm_checked_edge(); // set all checked entries to 0
The entries of the resulting matrix u[n][n] that have
been changed from +1 or -1 to 0 represent uncovered
edges in peripheral components. The non-zero entries
of the resulting matrix u[n][n] represent uncovered edges
in main components. This algorithm iterates over the
while loop to find edges in $E_{ucv}$ that are to be included
in $E_{ucv}^{pc}$. Thus, the number of complete iterations is
bounded by $n^3 \cdot |E_{ucv}|$.
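The peeling of peripheral edges in algorithm A3.2 can be sketched in Python as follows. The set-based representation, the name `split_uncovered`, and the four-gene example are assumptions for illustration; in the example every edge is taken to be an activation.

```python
def split_uncovered(f, uncovered):
    """Partition uncovered edges into (peripheral, main) components.

    f maps every initially deduced edge (i, j) to its sign (+1 or -1).
    """
    main = set(uncovered)
    peripheral = set()
    nodes = {v for edge in f for v in edge}
    while True:
        removable = set()
        for (i, j) in main:
            helps = False
            for k in nodes:
                # (i, j) together with (j, k) explains the uncovered edge (i, k).
                if (j, k) in f and (i, k) in main and f[(i, j)] * f[(j, k)] == f[(i, k)]:
                    helps = True
                # (k, i) together with (i, j) explains the uncovered edge (k, j).
                if (k, i) in f and (k, j) in main and f[(k, i)] * f[(i, j)] == f[(k, j)]:
                    helps = True
            if not helps:
                removable.add((i, j))
        if not removable:
            break
        main -= removable        # peel one layer of peripheral edges
        peripheral |= removable
    return peripheral, main


# Hypothetical example: (A, B) and (A, C) explain each other, (A, D) explains nothing.
f = {e: +1 for e in [("A", "B"), ("B", "C"), ("A", "C"),
                     ("C", "B"), ("C", "D"), ("A", "D")]}
uncovered = {("A", "B"), ("A", "C"), ("A", "D")}
peripheral, main = split_uncovered(f, uncovered)
print(peripheral, main)
```

Edges are removed one layer at a time, so an edge that only helps explain an already removed edge is itself removed in the next pass.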

(A4.1) Algorithm for dividing uncovered edges in main
components ($E_{ucv}^{mc}$) into independent groups

int t[n][n]: initially deduced edges
int e[n][n]: uncovered edges in main components
ig indgrp: independent group
list<edge> el: edge list
list<ig> igl: independent group list
void independent_group()
  int i, j;
  for i = 1 to n do
    for j = 1 to n do
      if e[i][j] ≠ 0 then
        el.clear();
        el.append(e_ij);
        append_group(i, j);
        indgrp.init();
        indgrp.set_el(el); // store edge list el in indgrp
        igl.append(indgrp); // indgrp: an independent group
void append_group(int i, int j)
  int x;
  for x = 1 to n do
    if t[i][x] ≠ 0 & t[x][j] ≠ 0 then
      if t[i][j] = t[i][x] × t[x][j] then
        if e[i][x] ≠ 0 then
          el.append(e_ix);
          e[i][x] := 0;
          append_group(i, x);
        if e[x][j] ≠ 0 then
          el.append(e_xj);
          e[x][j] := 0;
          append_group(x, j);
    if t[x][i] ≠ 0 & t[x][j] ≠ 0 then
      if t[x][j] = t[x][i] × t[i][j] then
        if e[x][j] ≠ 0 then
          el.append(e_xj);
          e[x][j] := 0;
          append_group(x, j);
    if t[j][x] ≠ 0 & t[i][x] ≠ 0 then
      if t[i][x] = t[i][j] × t[j][x] then
        if e[i][x] ≠ 0 then
          el.append(e_ix);
          e[i][x] := 0;
          append_group(i, x);
The number of complete iterations of independent_group()
is bounded by $n^2$. The number of complete
iterations of append_group(int, int) is bounded by
$(|E_{ucv}^{mc}| - 1) \cdot n$. Thus, the number of complete iterations
is bounded by $(|E_{ucv}^{mc}| - 1) \cdot n^3$.

(A4.2) Algorithm for finding all sets of minimum number of
edges to be restored in each independent group

int e[n][n]: essential edges and uncovered edges in
peripheral components
ig indgrp: independent group
list<ig> igl: independent group list
list<edge> el, tmp_el: edge list
list<edge list> combi_el: combination of edge list
void find_min_ig()
  int i, num_edge;
  for i = 1 to igl.size() do
    combi_el.clear(); el.clear();
    indgrp ← igl.get_ig(i); // copy the ith independent group from igl
    el ← indgrp.get_el();
    for num_edge = 1 to el.size() do
      add_edge(num_edge, 1);
      if (combi_el.size() > 0) then
        break;
    set_min_combi_el(i, combi_el); // store combi_el in the ith independent group
void add_edge(int num_edge, int start)
  int j;
  if start + num_edge - 1 > el.size() then
    return;
  for j = start to el.size() do
    set_edge(j); // set the entry of e[n][n] corresponding to the jth edge
                 // in el to +1 or -1 according to the sign of the edge
    check(el.get_edge(j)); // check the jth edge in el
    if num_edge > 1 then
      add_edge(num_edge - 1, j + 1);
    else
      if (confirm() = true) then
        tmp_el.clear();
        set_tmp_el(); // append all checked edges to tmp_el
        combi_el.append(tmp_el); // tmp_el: a set of the minimum number of
                                 // edges to be restored
    reset_edge(j); // set the entry of e[n][n] corresponding to the jth edge
                   // in el to 0
    uncheck(el.get_edge(j)); // uncheck the jth edge in el
bool confirm(): when the resulting edges e[n][n] cover
all edges in the group, return true.
The number of complete iterations is bounded by
$\sum_{j=1}^{G} \sum_{i=1}^{m_j} {}_{R_j}C_i \cdot (R_j - i) \cdot n_j$, where $G$ is the number of
independent groups, $R_j$ is the number of edges in the jth
independent group, $n_j$ is the number of genes in the jth
independent group, and $m_j$ is the number of edges to be
restored in the jth independent group.
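The search in algorithm A4.2 tries subsets of increasing size and stops at the first size for which some subset works. The Python sketch below expresses the same idea with `itertools.combinations` in place of the recursive add_edge routine; that substitution, the helper names `cover` and `minimum_restorations`, and the example graph (all activations) are assumptions for illustration.

```python
from itertools import combinations


def cover(edges, f):
    """Return the edges of E_ide (the keys of f) covered by `edges`."""
    covered = set(edges)
    changed = True
    while changed:
        changed = False
        for (j, k) in f:
            if (j, k) in covered:
                continue
            if any(a == j and (i, k) in covered
                   and f[(j, i)] * f[(i, k)] == f[(j, k)]
                   for (a, i) in covered):
                covered.add((j, k))
                changed = True
    return covered


def minimum_restorations(f, essential, group):
    """Return all smallest subsets of `group` that, with the essential edges, cover the group."""
    ordered = sorted(group)
    for size in range(1, len(ordered) + 1):
        hits = [set(c) for c in combinations(ordered, size)
                if set(ordered) <= cover(set(essential) | set(c), f)]
        if hits:
            return hits  # stop at the first size that yields a solution
    return []


# Hypothetical independent group {(A, B), (A, C)}: restoring either edge covers the other.
f = {e: +1 for e in [("A", "B"), ("B", "C"), ("A", "C"),
                     ("C", "B"), ("C", "D"), ("A", "D")]}
essential = {("B", "C"), ("C", "B"), ("C", "D")}
group = {("A", "B"), ("A", "C")}
solutions = minimum_restorations(f, essential, group)
print(solutions)
```

Because every subset of a given size is tested before the size is increased, the returned list contains all minimum solutions for the group, which is what the later combination step needs.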
(A5) Algorithm for deducing all MEGNs by making all
possible combinations of sets of the minimum number of
edges for each independent group

int e[n][n]: essential edges
int megn: the number of MEGNs
list<ig> igl: independent group list
list<edge list> tmp_combi_el: combination of edge list
list<edge> tmp_el: edge list
void megn()
  int i;
  i := 1; megn := 0;
  sub_megn(i);
  if megn = 0 then
    e[n][n]: MEGN // e[n][n] represents the MEGN when $E_{es}^{cov} = E_{ide}$
void sub_megn(int i)
  int x, y, count;
  if i > igl.size() then
    return;
  tmp_combi_el ← get_min_combi_el(i); // copy combi_el of the ith independent group
  for y = 1 to tmp_combi_el.size() do
    tmp_el ← tmp_combi_el.get_el(y); // copy the yth edge list of tmp_combi_el
    set_edges(tmp_el); // set the entries of e[n][n] corresponding to the edges in
                       // tmp_el to +1 or -1 according to the signs of the edges
    if i = igl.size() then
      megn++;
      e[n][n]: MEGN // e[n][n] represents a MEGN when $E_{es}^{cov} \neq E_{ide}$
    else
      sub_megn(i + 1)
    reset_edges(tmp_el); // set the entries of e[n][n] corresponding to the edges in
                         // tmp_el to 0
The number of complete iterations is bounded by
$\prod_{j=1}^{G} S_j$, where $S_j$ is the number of sets of minimum
number of edges to be restored for the jth independent
group.
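The combination step of algorithm A5 amounts to a Cartesian product over the per-group solutions. The sketch below uses `itertools.product` instead of the recursive sub_megn routine; that substitution, the function name `enumerate_megns`, and the edge labels in the example are assumptions for illustration.

```python
from itertools import product


def enumerate_megns(essential, per_group_solutions):
    """Return every MEGN edge set: essential edges plus one solution per group.

    per_group_solutions is a list with one entry per independent group;
    each entry is the list of minimum edge sets found for that group.
    """
    megns = []
    for choice in product(*per_group_solutions):
        edges = set(essential)
        for restored in choice:
            edges |= set(restored)
        megns.append(edges)
    return megns


# Hypothetical input: two groups with two minimum solutions each.
essential = {("B", "C")}
group0 = [{("E", "J")}, {("F", "J")}]
group1 = [{("H", "K")}, {("H", "L")}]
megns = enumerate_megns(essential, [group0, group1])
print(len(megns))
```

With no independent groups the product yields a single empty choice, so the function returns the essential edges alone, which corresponds to the case in which the essential edges already cover all initially deduced edges.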

Discussion 

We have described in detail the algorithm of the DBRF- 
MEGN method and have proved that the algorithm pro- 
vides all of the exact solutions of the most parsimonious 
gene networks consistent with expression profiles of gene 
deletion mutants. The resulting gene networks, called 
MEGNs, are the most parsimonious SDGs consistent 
with an SDG that consists of the initially deduced edges. 
In graph theory, many algorithms have been developed 
for deducing the most parsimonious unsigned directed 
graphs consistent with a given unsigned directed graph; 
these graphs are called minimum equivalent graphs 
(MEGs) [22-25]. A MEGN is not simply an "SDG version" of an
MEG. Although both are the most parsimonious graphs
derived from a given graph, parsimony is defined
differently for the two. A MEGN consists of the minimum
number of edges that cover all edges of a given graph 
(initially deduced edges), whereas MEG consists of the 
minimum number of edges that retain the reachability of 
a given graph [22]. MEGNs use the cover instead of the 
reachability because a MEGN is a prediction of a gene 
network consisting only of direct gene regulations [14]. 
When positive regulations from gene A to gene B, from 
gene B to gene C, from gene C to gene D, and from gene 
A to gene D are detected and regulation from gene A to 
gene C is not detected, the regulation from gene A to 
gene D is likely to be a direct regulation instead of an
indirect regulation as a result of the other three regula- 
tions (Figure 3a). The use of cover makes MEGNs 
include edges representing such likely direct regulations 
(Figure 3a). In contrast, the MEGs, using reachability, do 
not include those edges (Figure 3b). Therefore, the 
DBRF-MEGN method, which deduces MEGNs, is funda- 
mentally different from algorithms that deduce MEGs or 
algorithms for transitive reduction of SDG [16-18]. 
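
The four-gene example can be made concrete with a small Python sketch. It relies on a simplifying assumption that is ours, not the formal definition of cover: an edge is treated as explainable by indirect regulation only if some intermediate gene gives a two-edge path of detected edges with the same sign product. The function names and the dictionary encoding of edges are hypothetical.

```python
def has_two_step_explanation(edges, u, v):
    """True if some gene w gives detected edges u->w and w->v whose
    sign product equals the sign of the detected edge u->v."""
    sign = edges[(u, v)]
    nodes = {gene for pair in edges for gene in pair}
    return any(
        (u, w) in edges
        and (w, v) in edges
        and edges[(u, w)] * edges[(w, v)] == sign
        for w in nodes - {u, v}
    )


def reachable_without(edges, u, v):
    """True if v is still reachable from u after the edge u->v is removed
    (signs ignored, as in an unsigned reachability criterion)."""
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        for (a, b) in edges:
            if a == x and (a, b) != (u, v) and b not in seen:
                if b == v:
                    return True
                seen.add(b)
                stack.append(b)
    return False


# Detected positive regulations: A->B, B->C, C->D and A->D; A->C is absent.
detected = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1, ("A", "D"): 1}

kept_by_cover = not has_two_step_explanation(detected, "A", "D")
redundant_by_reachability = reachable_without(detected, "A", "D")
```

Under this assumption the edge from A to D is kept by the cover criterion, because no intermediate gene explains it, whereas a reachability criterion regards it as redundant because D remains reachable from A through B and C.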

The selection of uncovered edges in main components 
(the third process) and the generation of independent 
groups (the fourth process) make the DBRF-MEGN 
method applicable to large-scale gene expression profiles. 
Without these processes, the computational cost for finding
all sets of non-essential edges to be included in the
MEGNs is

$$n \sum_{i=1}^{m} {}_{|E_{nes}|}C_i \cdot (|E_{nes}| - i),$$

where $n$ is the number of genes and $m$ is the number of
non-essential edges to be included in a MEGN. This computation is
impractical for large-scale gene expression profiles because
${}_{|E_{nes}|}C_m$ increases rapidly as $|E_{nes}|$ or $m$ increases. The
selection of uncovered edges in main components reduces the
computational cost to

$$n \sum_{i=1}^{m} {}_{|E'_{ucv}|}C_i \cdot (|E'_{ucv}| - i),$$

and the generation of independent groups further reduces it to

$$\sum_{j=1}^{t} \left( \sum_{i=1}^{m_j} {}_{|E'_{ucv,j}|}C_i \cdot (|E'_{ucv,j}| - i) \right) \cdot n_j,$$

where $t$ is the number of independent groups, $n_j$ is the number of
genes in the jth independent group, $E'_{ucv,j}$ is the set of uncovered
edges in the jth independent group, and $m_j$ is the number of edges
in the jth independent group to be included in a MEGN.
$|E'_{ucv}|$ and $m_j$ are usually far smaller than $|E_{nes}|$ and $m$.

Because of these reductions in computational cost, the
DBRF-MEGN method successfully deduced MEGNs from
sets of large-scale gene expression profiles [14] [see
Additional file 2, Table S1; Additional file 3]. Although there is
no guarantee that the method will deduce MEGNs from
any given set of expression profiles in an acceptable time,
it is likely to do so for most sets of expression profiles.
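
To give a feel for how the two reductions shrink the search, the Python sketch below evaluates a cost of the form $n \sum_{i=1}^{m} {}_{|E|}C_i \cdot (|E| - i)$ for three cases: all non-essential edges, the uncovered edges in main components, and two independent groups. The functional form is our reading of the cost expressions above, the helper name search_cost is hypothetical, and all sizes are made-up numbers chosen for illustration, not values from any data set.

```python
from math import comb


def search_cost(num_genes, num_edges, num_selected):
    """Cost n * sum_{i=1}^{m} C(|E|, i) * (|E| - i) under the assumed form."""
    return num_genes * sum(
        comb(num_edges, i) * (num_edges - i)
        for i in range(1, num_selected + 1)
    )


# Hypothetical sizes: 10 genes, 20 non-essential edges, 4 edges to select.
cost_all_non_essential = search_cost(10, 20, 4)

# Hypothetical: only 8 uncovered edges remain in main components.
cost_uncovered_only = search_cost(10, 8, 4)

# Hypothetical: two independent groups, each with 5 genes,
# 4 uncovered edges, and 2 edges to select.
independent_groups = [(5, 4, 2), (5, 4, 2)]
cost_with_groups = sum(search_cost(n_j, e_j, m_j)
                       for n_j, e_j, m_j in independent_groups)
```

With these numbers the cost drops from 1,007,000 to 7,840 and then to 240, which shows why restricting the search to uncovered edges and splitting them into independent groups matters far more than the number of genes.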

Because MEGNs are deduced from initially deduced 
edges, the accuracy of MEGNs depends on that of initi- 
ally deduced edges. The primary source for the inaccu- 
racy in initially deduced edges is the noise of the 
expression profiles. Importantly, the number of false- 
positive edges in MEGN depends more on that of fal- 
sely-detected edges than that of falsely-missed edges in 
initially deduced edges; the number of false-negative 
edges in MEGN depends more on that of falsely-missed 
edges than that of falsely-detected edges in initially 
deduced edges [see Additional file 2, Table S2; Additional
file 2, Figure S1]. These dependencies suggest
the following guideline for the thresholds α and β (Definition 2):
when the number of false-positive edges is
more important than that of false-negative edges in
MEGN, α (β) should be a little higher (lower) than the
optimal value; in contrast, when the number of false-negative
edges is more important than that of false-positive
edges in MEGN, α (β) should be a little lower
(higher) than the optimal value.

The DBRF-MEGN method is applicable not only to 
gene expression profiles of deletion mutants but also to 
those of gene overexpressions and conditional knock- 
downs/knock-outs [26-28]. We cannot obtain gene 
expression profiles of deletion mutants for essential 
genes. Thus, the method cannot deduce gene networks 
including essential genes when we use gene expression 
profiles of deletion mutants. A possible solution for this 
problem is to use the expression profiles of gene overex- 
pressions or conditional knock-downs/knock-outs. 
Applications of the DBRF-MEGN method to those 
profiles will deduce gene regulations that cannot be 
deduced from gene expression profiles of gene deletion 
mutants. 

A limitation of the DBRF-MEGN method is its inabil- 
ity to deduce (1) self-regulation of genes, and (2) combi- 
natorial gene regulations such as regulation in which the 
expression of gene A is down-regulated only when both 
gene B and gene C are inactive. Self-regulation could be 
deduced by using chromatin immunoprecipitation [29]. 
Combinatorial gene regulations could be deduced by 
using the expression profiles of multiple gene deletion 
mutants [30]. Synthetic genetic arrays can systematically 
construct a collection of double-gene deletion mutants 
[31]. A combination of the DBRF-MEGN method and 
the above techniques would provide more accurate 
information about gene networks. 
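
The reason single-gene deletion profiles cannot reveal such a combinatorial regulation can be seen in a toy Python model. The rule below, in which gene A is expressed unless both gene B and gene C are inactive, and the expression values 1.0 and 0.0 are hypothetical and serve only to illustrate the argument.

```python
def expression_of_a(b_active, c_active):
    """Hypothetical rule: A is down-regulated only when both B and C are inactive."""
    return 1.0 if (b_active or c_active) else 0.0


wild_type = expression_of_a(True, True)
delta_b = expression_of_a(False, True)    # single deletion of gene B
delta_c = expression_of_a(True, False)    # single deletion of gene C
delta_bc = expression_of_a(False, False)  # double deletion of genes B and C

# Difference from wild type in each mutant.
single_changes = [abs(x - wild_type) for x in (delta_b, delta_c)]
double_change = abs(delta_bc - wild_type)
```

Neither single deletion changes the expression of gene A in this model, so a difference-based comparison against the wild type detects no edge; only the double-deletion mutant exposes the regulation.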

When the DBRF-MEGN method is applied to gene 
expression profiles measured by using DNA microarray, 
each of the deduced edges represents regulation of one 
gene's mRNA level by another gene's activity. Therefore, 
the deduced MEGNs do not include edges that repre- 
sent post-transcriptional gene regulations although they 
play major roles in the cell. However, because the algo- 
rithm of the DBRF-MEGN method is based on logic 
that is most commonly used in genetics and cell biology 
to infer gene networks from small-scale experiments, we 
can predict post-transcriptional modulators of transcriptional
activity from those MEGNs. We predicted a total of
72 transcriptional regulators and 232 post-transcriptional
modulators of 18 transcriptional regulators from
the MEGNs deduced from a set of gene expression pro- 
files for 265 Saccharomyces cerevisiae genes [14]. The 
DBRF-MEGN method is applicable not only to gene 
expression profiles measured by using DNA microarray 
but also to those measured by using other technologies 
such as 2D-PAGE-MS [32] and protein chips [33]. 
MEGNs deduced from those non-DNA microarray 
expression profiles will include edges that represent
post-transcriptional gene regulations in the cell. 

Conclusions 

We described in detail the processes of the DBRF- 
MEGN method and proved that these processes provide 
all of the exact solutions of the most parsimonious gene 
networks consistent with the expression profiles of gene 
deletion mutants, which are called MEGNs. The DBRF- 
MEGN method provides invaluable information for 
understanding cellular functions. 

Availability and requirements 

Project name: DBRF-MEGN 
Project home page: http://so.gsc.riken.jp/dbrf-megn 
Operating system: Linux 
Programming language: C++ 
Other requirements: None 
Licence: GNU LGPL 

Any restrictions to use by non-academics: Licence 
required 

Additional material 



List of abbreviations 

DBRF: difference-based regulation finding; MEGN: minimum equivalent gene 
network; SDG: signed directed graph; MEG: minimum equivalent graph. 